-- Exploring a simple English language parser using the Semantic DB
-- The intent is to learn triples from the given input sentences, to partly auto-construct a graph structure.
--
-- The underlying theory of this parser is we first map words to words with labelled types/parts-of-speech.
-- (we often deviate from standard parts-of-speech labels, so it is more correct to say we are labelling with types not PoS)
-- Eg: |frog> maps to |noun: frog>
-- Unfortunately we don't currently have a good approach for the common case where a word has more than one possible type.
--
-- Note, we auto-learn proper-nouns, using the property that if the first char of a word is a capital, and we don't already know its type we label it a proper noun.
-- This way we don't need to manually define proper nouns for all the possible variations of proper-nouns (which is essentially unlimited).
-- We also auto learn possesive nouns if the last two characters of a word are 's. The plural noun case is not currently handled, but might be added in later.
--
-- We also have quote and bracket layers. These extract out quoted text, and bracketed text, labelled with types |quote: > and |bracket: > respectively.
-- These are actually applied just before the map layer, and are passed through the map layer unchanged.
-- The text inside the quotes and bracket text is not processed by the parser, and passed unchanged through all the layers.
--
-- Once we have words of form: |A: alpha>, where A is a type and alpha is the corresponding word we have an extendable collection of rules that act pair wise on the input sequence.
-- Eg, |A: alpha> . |B: beta> maps to a new ket depending on the relation between the A type and the B type.
-- For example, we handle the cases where: A == B, or A < B, or A > B.
-- If there is no known relation between A and B, then we leave the kets unchanged in the given sequence.
--
-- Now for a brief description of our type relation rules:
-- If A == B, ie, they have the same type importance, then there are two possible cases:
-- The new ket takes the type of the first element, or it takes the type of the second element (the most common case is the first of these):
-- Eg, |A: alpha> . |B: beta> maps to either |A: alpha beta> or |B: alpha beta> depending on being either equal-type 1 or equal-type 2.
-- We also auto-learn the map operator applied to |alpha beta>. Eg, map |alpha beta> => |A: alpha beta>
--
-- If A < B, ie, A is less important than B, then:
-- |A: alpha> . |B: beta> maps to |B: beta> and we also learn: A|beta> +=> |alpha>
-- Due to a quirk of grammar, we actually need to apply this layer at least three times in a row.
-- It may turn out we need to apply it even more times.
-- To implement this idea we use a short-cut in our notation: op^k, which applies "op", k times in a row, to the given input.
--
-- If A > B, ie A is more important then B (this case is far less common than the previous case), then:
-- |A: alpha> . |B: beta> maps to |A: alpha> and we also learn: B|alpha> +=> |beta>
-- The current use case for this is A is type noun, and B is type bracket.
--
-- We also have a merge layer and a cast layer.
-- For example in the merge layer: number . and . number maps to type number
-- And in the cast layer: noun . comma . noun maps to type noun-list
-- And again, we auto learn the map operator on our newly generated kets.
--
-- The final layer is a small collection of if-then machines that detect grammar structures and then produces a SDB triple.
-- Eg, |*> . |is> . |*> maps to: is |alpha> +=> |gamma>
-- We also count how many times a particular if-then machine rule has been activated, using the activation-count variable.
-- The presumption is that the more times a rule has been activated, the more "grammatically correct" that rule is.
--
-- Finally, the code "dumps" the newly acquired learn rules, so we can see what we have learnt from the parsing stage.
-- And we also dump the if-then machine activation-counts.
--
-- It is not known how far this approach can be taken with real world sentences.
-- It is not known if the type importance rules are universal, and invariant regardless of a given context.
-- It is also not known if the requried order of application of our layers is invariant regardless of a given context.
-- Is is also not known if SDB triples are sufficient to represent sentences in general, or if a more general structure is required.
-- Finally, it should be mentioned that the SDB language is rather slower than would be desired!
--
-- Author: Garry Morrison
-- Created: 2023/5/1
-- Updated: 2023/5/8
|context> => |SDB simple parser>
print |The simple English Semantic DB parser:>
-- define some sentences:
learn-sentences |*> #=>
sentence |1> => |An Albert Einstein quote is " God does not play dice " #EOS#>
sentence |2> => |test ( our bracket text here ) #EOS#>
sentence |3> => |The capital city of New South Wales is Sydney #EOS#>
sentence |4> => |The shopping list is two apples , three oranges , bread and milk #EOS#>
sentence |5> => |Their example listy is one thousand two hundred and thirty two red and blue green fish , thirty three sharks , an eel and a green turtle #EOS#>
sentence |6> => |Some Einstein quotes are " X " and " Y " #EOS#>
sentence |7> => |Some Einstein quotes are " God does not play dice " and " Imagination is more important than knowledge " #EOS#>
sentence |8> => |My friends are Sam , Rob , Fred Smith and Emma #EOS#>
sentence |9> => |The capital cities of Australia are Adelaide , Brisbane , Canberra , Darwin , Hobart , Melbourne , Perth and Sydney #EOS#>
sentence |10> => |Not Fred Robert Smith #EOS#>
sentence |11> => |Our example bracket test ( our sample bracket text here ) is " just a toy quote " #EOS#>
sentence |12> => |My green slippery frog sat on a smooth brown rock #EOS#>
sentence |13> => |The tired man used a pair of binoculars #EOS#>
sentence |14> => |Fred Robert Smith's wife's brother's pet dog #EOS#>
-- now load them into memory:
learn-sentences
-- choose one/some to parse:
-- the |sentence> => sentence (|2> + |4> + |5> + |6> + |7> + |10> + |14>)
-- let's learn them all!
the |sentence> => sentence rel-kets[sentence]
-- switch verbose mode on/off:
-- verbose |mode> => |yes>
verbose |mode> => |no>
-- display our sentences operator:
display-sentences |*> #=>
print | >
print |Our sentences:>
for( the |sentence number> in rel-kets[sentence] ):
print (| > _ the |sentence number> _ |)> __ sentence the |sentence number>)
end:
print | >
-- now display them:
display-sentences
-- our parse operator:
-- parse |*> #=> learn-layer sdrop cast-types-layer greater-than-layer less-than-layer^3 merge-types-layer equality-2-layer equality-1-layer map bracket-layer quote-layer ssplit[" "] |_self>
-- parse |*> #=> learn-layer sdrop cast-types-layer less-than-layer^3 merge-types-layer greater-than-layer equality-2-layer equality-1-layer map bracket-layer quote-layer ssplit[" "] |_self>
-- parse |*> #=> learn-layer sdrop cast-types-layer less-than-layer^3 merge-types-layer greater-than-layer equality-2-layer equality-1-layer map bracket-layer quote-layer ssplit[" "] sprint["Chosen sentence: "] |_self>
parse |*> #=> learn-layer sdrop sprint["Final sequence: "] cast-types-layer less-than-layer^3 merge-types-layer greater-than-layer equality-2-layer equality-1-layer map bracket-layer quote-layer ssplit[" "] sprint["Input sentence: "] |_self>
-- our not operator:
not |no> => |yes>
not |yes> => |no>
-- our quote layer operator:
quote-layer (*) #=>
the |words> => |__self>
unlearn[the] (|result> + |quote>)
is |quote> => |no>
sfor( the |word> in the |words>):
if( the |word> == |">):
if( is |quote>):
the |result> .=> |quote> :_ smerge[" "] the |quote>
unlearn[the] |quote>
end:
is |quote> => not is |quote>
else:
if( is |quote>):
the |quote> .=> the |word>
else:
the |result> .=> the |word>
end:
end:
end:
the |result>
-- a quick test:
-- sprint parse sentence |1>
-- our bracket-layer operator:
bracket-layer (*) #=>
the |words> => |__self>
unlearn[the] (|result> + |bracket text>)
is-in |bracket> => |no>
sfor( the |word> in the |words>):
if( the |word> == |(>):
is-in |bracket> => |yes>
else:
if( the |word> == |)>):
is-in |bracket> => |no>
the |result> .=> |bracket> :_ smerge[" "] the |bracket text>
unlearn[the] |bracket text>
else:
if( is-in |bracket>):
the |bracket text> .=> the |word>
else:
the |result> .=> the |word>
end:
end:
end:
end:
the |result>
-- a quick test:
-- sprint parse sentence |2>
-----------------------------------------------------------------------------------------------------------
-- define our lists of parts of speech/types (extend as required):
list-of |determiner> => split[" "] |The the A a An an Some some This this That that>
list-of |colour> => split[" "] |red orange yellow green blue indigo violet brown>
list-of |adj> => split[" "] |hungry tired shopping example smooth slippery>
list-of |noun> => split[" "] |frog man toad elephant rock bridge list apples oranges bread milk rose listy fish sharks eel turtle capital city cities quote quotes test friends sisters brothers bracket pair binoculars pet dog>
list-of |title> => split[" "] |Mr Mrs Miss Dr Sir Madam>
-- list-of |proper-noun> => split[" "] |Fred John Smith Frank Roberts Emma Liz Edwards South Western Australia Adelaide Perth New Wales Sydney>
list-of |number> => split[" "] |one two three four five six seven eight nine ten eleven twelve thirteen fourteen fifteen sixteen seventeen eighteen nineteen twenty thirty forty fifty sixty seventy eighty ninety hundred thousand million billion trillion>
list-of |token> => split[" "] |#EOS#>
list-of |whose> => split[" "] |My my Her her His his Their their Our our>
list-of |verb> => split[" "] |sat used>
list-of |preposition> => split[" "] |on>
-----------------------------------------------------------------------------------------------------------
-- function required to learn them:
apply-prefix {the|prefix>, the |words>} #=>
for( the |word> in the |words>):
map the |word> => the |prefix> :_ the |word>
end:
-- now learn them:
apply-prefix(|determiner>, list-of |determiner>)
apply-prefix(|colour>, list-of |colour>)
apply-prefix(|adj>, list-of |adj>)
apply-prefix(|noun>, list-of |noun>)
apply-prefix(|title>, list-of |title>)
-- apply-prefix(|proper-noun>, list-of |proper-noun>)
apply-prefix(|number>, list-of |number>)
apply-prefix(|token>, list-of |token>)
apply-prefix(|whose>, list-of |whose>)
apply-prefix(|verb>, list-of |verb>)
apply-prefix(|preposition>, list-of |preposition>)
-- now some special ones:
map |is> => |is: is>
map |are> => |are: are>
map |was> => |was: was>
map |and> => |and: and>
map |or> => |or: or>
map |comma> => |comma: comma>
map |,> => |comma: ,>
map |of> => |of: of>
map |not> => |not: not>
map |Not> => |not: not>
-- pass quotes and bracket text unchanged:
map |quote: *> #=> |_self>
map |bracket: *> #=> |_self>
-- learn proper nouns:
-- making use of the idea of label descent, which causes this function to only be invoked for words that don't have an existing map definition.
-- we also use memoizing rules so the pattern is stored in memory
map |*> !=>
unlearn[the] |result>
if( sselect[-2,-1] ssplit |__self> == ssplit |'s>):
the |result> => |possesive-noun> :_ |__self>
end:
if( not do-you-know the |result> && (|__self> == to-upper[1] |__self>)):
the |result> => |proper-noun> :_ |__self>
end:
if( not do-you-know the |result>):
the |result> => |__self>
end:
the |result>
-- a quick test:
-- sprint parse sentence |2>
-----------------------------------------------------------------------------------------------------------
-- now the start of the grammar (extend with more rules as required):
-- equal l relation rules:
-- |A: alpha> . |B: beta> maps to |A: alpha beta>
--
relation-1 |noun __ noun> => |equal 1>
relation-1 |proper-noun __ proper-noun> => |equal 1>
relation-1 |proper-noun __ noun> => |equal 1>
relation-1 |possesive-noun __ possesive-noun> => |equal 1>
relation-1 |adj __ adj> => |equal 1>
relation-1 |number __ number> => |equal 1>
relation-1 |colour __ colour> => |equal 1>
relation-1 |verb __ preposition> => |equal 1>
-- equal 2 relation rules:
-- |A: alpha> . |B: beta> maps to |B: alpha beta>
--
relation-2 |not __ noun> => |equal 2>
relation-2 |not __ proper-noun> => |equal 2>
relation-2 |not __ adj> => |equal 2>
relation-2 |proper-noun __ possesive-noun> => |equal 2>
-- merge types layer:
-- eg: |number: alpha> . |and: and> . |number: beta> maps to |number: alpha and beta>
--
type-1 |number __ and __ number> => |number>
type-1 |colour __ and __ colour> => |colour>
type-1 |noun __ of __ noun> => |noun>
type-1 |noun __ of __ proper-noun> => |noun>
-- less than relation rules:
-- If A < B, then |A: alpha> . |B: beta> maps to |B: beta>
-- and we learn: A |beta> +=> |alpha>
--
relation-3 |adj __ noun> => |less than>
relation-3 |determiner __ noun> => |less than>
relation-3 |determiner __ proper-noun> => |less than>
relation-3 |number __ noun> => |less than>
relation-3 |colour __ noun> => |less than>
relation-3 |whose __ noun> => |less than>
relation-3 |title __ proper-noun> => |less than>
relation-3 |possesive-noun __ noun> => |less than>
-- greater than relation rules:
-- if A > B then |A: alpha> . |B: beta> maps to |A: alpha>
-- and we learn: B|alpha> +=> |beta>
--
relation-4 |noun __ bracket> => |greater than>
-- cast types layer:
-- eg: |noun: alpha> . |comma: comma> . |noun: beta> maps to |noun-list: alpha comma beta>
--
type-2 |noun __ comma __ noun> => |noun-list>
type-2 |noun-list __ comma __ noun> => |noun-list>
type-2 |noun-list __ and __ noun> => |noun-list>
type-2 |proper-noun __ comma __ proper-noun> => |proper-noun-list>
type-2 |proper-noun-list __ comma __ proper-noun> => |proper-noun-list>
type-2 |proper-noun-list __ and __ proper-noun> => |proper-noun-list>
-- disable this one for now, and shift to the if-then machine layer:
-- type-2 |noun __ verb __ noun> => |NVN>
-- define some if-then machines for the final layer:
template |node: 1: 1> => |*> . |is> . |*>
activation-count |node: 1: 1> => |0>
then |node: 1: *> #=>
activation-count |__self> => plus[1] activation-count |__self>
is sselect[1,1] the |values> +=> sselect[-1,-1] the |values>
sselect[1,1] the |values>
template |node: 2: 1> => |*> . |was> . |*>
activation-count |node: 2: 1> => |0>
then |node: 2: *> #=>
activation-count |__self> => plus[1] activation-count |__self>
was sselect[1,1] the |values> +=> sselect[-1,-1] the |values>
sselect[1,1] the |values>
template |node: 3: 1> => |proper-noun> . |comma> . |proper-noun>
activation-count |node: 3: 1> => |0>
then |node: 3: *> #=>
activation-count |__self> => plus[1] activation-count |__self>
where sselect[1,1] the |values> +=> sselect[-1,-1] the |values>
sselect[1,1] the |values>
template |node: 4: 1> => |*> . |are> . |*> . |and> . |*>
activation-count |node: 4: 1> => |0>
then |node: 4: *> #=>
activation-count |__self> => plus[1] activation-count |__self>
are sselect[1,1] the |values> +=> sselect[3,3] the |values> + sselect[5,5] the |values>
sselect[1,1] the |values>
template |node: 5: 1> => |*> . |are> . |*>
activation-count |node: 5: 1> => |0>
then |node: 5: *> #=>
activation-count |__self> => plus[1] activation-count |__self>
are sselect[1,1] the |values> +=> sselect[3,3] the |values>
sselect[1,1] the |values>
template |node: 6: 1> => |noun> . |verb> . |noun>
activation-count |node: 6: 1> => |0>
then |node: 6: *> #=>
activation-count |__self> => plus[1] activation-count |__self>
the |op> => |op> :_ smerge["-"] ssplit[" "] sselect[2,2] the |values>
the |subject> => sselect[1,1] the |values>
the |object> => sselect[3,3] the |values>
add-learn(the |op>, the |subject>, the |object>)
the |subject>
-----------------------------------------------------------------------------------------------------------
-- now on to the implementation section:
-- the equality 1 operator:
apply-equality-1 (*) #=>
the |types> => extract-head |__self>
the |values> => extract-value |__self>
the |relation> => relation-1 smerge[" __ "] the |types>
if( verbose |mode>):
print | >
sprint["input: "] |__self>
sprint["relation: "] the |relation>
end:
if( the |relation> == |equal 1>):
the |new ket> => sselect[1,1] the |types> :_ smerge[" "] the |values>
map smerge[" "] the |values> => the |new ket>
the |new ket>
else:
sselect[1,1] |__self>
end:
equality-1-layer (*) #=>
if( verbose |mode>):
print |--------------------------->
print |equality 1 layer:>
end:
unlearn[the] |result>
the |pre> => sselect[1,1] |__self>
sfor( the |word> in sselect[2,-1] |__self>):
local |result> => apply-equality-1( the |pre> . the |word>)
if( local |result> == the |pre>):
the |result> .=> the |pre>
the |pre> => the |word>
else:
the |pre> => local |result>
end:
end:
the |result> . |> . |>
-- a quick test:
-- sprint parse sentence |1>
-- the equality 2 operator:
apply-equality-2 (*) #=>
the |types> => extract-head |__self>
the |values> => extract-value |__self>
the |relation> => relation-2 smerge[" __ "] the |types>
if( verbose |mode>):
print | >
sprint["input: "] |__self>
sprint["relation: "] the |relation>
end:
if( the |relation> == |equal 2>):
the |new ket> => sselect[2,2] the |types> :_ smerge[" "] the |values>
map smerge[" "] the |values> => the |new ket>
the |new ket>
else:
sselect[1,1] |__self>
end:
equality-2-layer (*) #=>
if( verbose |mode>):
print |--------------------------->
print |equality 2 layer:>
end:
unlearn[the] |result>
the |pre> => sselect[1,1] |__self>
sfor( the |word> in sselect[2,-1] |__self>):
local |result> => apply-equality-2( the |pre> . the |word>)
if( local |result> == the |pre>):
the |result> .=> the |pre>
the |pre> => the |word>
else:
the |pre> => local |result>
end:
end:
the |result> . |> . |>
-- the less than operator:
apply-less-than (*) #=>
the |types> => extract-head |__self>
the |values> => extract-value |__self>
the |relation> => relation-3 smerge[" __ "] the |types>
if( verbose |mode>):
print | >
sprint["input: "] |__self>
sprint["relation: "] the |relation>
end:
if( the |relation> == |less than>):
the |op> => |op> :_ sselect[1,1] the |types>
the |subject> => sselect[2,2] the |values>
the |object> => sselect[1,1] the |values>
add-learn(the |op>, the |subject>, the |object>)
list-of |updated kets> +=> the |subject>
sselect[2,2] |__self>
else:
sselect[1,1] |__self>
end:
less-than-layer (*) #=>
if( verbose |mode>):
print |--------------------------->
print |less than layer:>
end:
unlearn[the] |result>
the |pre> => sselect[1,1] |__self>
sfor( the |word> in sselect[2,-1] |__self>):
local |result> => apply-less-than( the |pre> . the |word>)
if( local |result> == the |pre>):
the |result> .=> the |pre>
the |pre> => the |word>
else:
the |pre> => local |result>
end:
end:
the |result> . |> . |>
-- a quick test:
-- sprint parse sentence |1>
-- the greater than operator:
apply-greater-than (*) #=>
the |types> => extract-head |__self>
the |values> => extract-value |__self>
the |relation> => relation-4 smerge[" __ "] the |types>
if( verbose |mode>):
print | >
sprint["input: "] |__self>
sprint["relation: "] the |relation>
end:
if( the |relation> == |greater than>):
the |op> => |op> :_ sselect[2,2] the |types>
the |subject> => sselect[1,1] the |values>
the |object> => sselect[2,2] the |values>
add-learn(the |op>, the |subject>, the |object>)
list-of |updated kets> +=> the |subject>
sselect[1,1] |__self>
else:
sselect[2,2] |__self>
end:
greater-than-layer (*) #=>
if( verbose |mode>):
print |--------------------------->
print |greater than layer:>
end:
unlearn[the] |result>
the |pre> => sselect[1,1] |__self>
the |result> => the |pre>
sfor( the |word> in sselect[2,-1] |__self>):
local |result> => apply-greater-than( the |pre> . the |word>)
if( local |result> == the |pre>):
the |pre> => the |word>
else:
the |result> .=> local |result>
the |pre> => local |result>
end:
end:
the |result> . |> . |>
-- a quick test:
-- sprint parse sentence |1>
apply-merge-types (*) #=>
the |types> => extract-head |__self>
the |values> => extract-value |__self>
the |type> => type-1 smerge[" __ "] the |types>
if( verbose |mode>):
print | >
sprint["input: "] |__self>
sprint["type: "] the |type>
end:
if( do-you-know the |type>):
the |new ket> => the |type> :_ smerge[" "] the |values>
map smerge[" "] the |values> => the |new ket>
the |new ket>
else:
sselect[1,1] |__self>
end:
merge-types-layer (*) #=>
if( verbose |mode>):
print |--------------------------->
print |merge types layer:>
end:
unlearn[the] |result>
the |pre 1> => sselect[1,1] |__self>
the |pre 2> => sselect[2,2] |__self>
sfor( the |word> in sselect[3,-1] |__self>):
local |result> => apply-merge-types( the |pre 1> . the |pre 2> . the |word>)
if( local |result> == the |pre 1>):
the |result> .=> the |pre 1>
the |pre 1> => the |pre 2>
the |pre 2> => the |word>
else:
the |pre 1> => |>
the |pre 2> => local |result>
end:
end:
sdrop the |result> . |> . |>
-- a quick test:
-- sprint parse sentence |3>
-- print | >
-- dump(|*>) rel-kets[is]
apply-cast-types (*) #=>
the |types> => extract-head |__self>
the |values> => extract-value |__self>
the |type> => type-2 smerge[" __ "] the |types>
if( verbose |mode>):
print | >
sprint["input: "] |__self>
sprint["type: "] the |type>
end:
if( do-you-know the |type>):
the |type> :_ smerge[" "] the |values>
else:
sselect[1,1] |__self>
end:
cast-types-layer (*) #=>
if( verbose |mode>):
print |--------------------------->
print |cast types layer:>
end:
unlearn[the] |result>
the |pre 1> => sselect[1,1] |__self>
the |pre 2> => sselect[2,2] |__self>
sfor( the |word> in sselect[3,-1] |__self>):
local |result> => apply-cast-types( the |pre 1> . the |pre 2> . the |word>)
if( local |result> == the |pre 1>):
the |result> .=> the |pre 1>
the |pre 1> => the |pre 2>
the |pre 2> => the |word>
else:
the |pre 1> => |>
the |pre 2> => local |result>
end:
end:
sdrop the |result> . |> . |>
-- a quick test:
-- sprint parse sentence |4>
-- print | >
-- dump(|*>) rel-kets[is]
-- the final layer, where we use if-then machines to learn some rules:
learn-layer (*) #=>
the |types> => extract-head |__self>
the |values> => extract-value |__self>
if( verbose |mode>):
print |--------------------------->
print |if-then machine learn layer:>
sprint["types: "] the |types>
sprint["values: "] the |values>
end:
list-of |updated kets> +=> then star-equal-input[template] the |types>
|__self>
-- Extract out lists of possesive nouns and proper nouns:
-- Doesn't work as initally expected.
-- list-of |possesive-noun> => filter(|op: extract-head>, |possesive-noun>) rel-kets[*]
-- list-of |proper-noun> => filter(|op: extract-head>, |proper-noun>) rel-kets[*]
-- Let's learn our inverse list-of rules:
-- Switch off for now.
-- find-inverse[list-of]
-- Now we have that all in place, let's parse our sentence(s):
parse the |sentence>
print | >
print |---------------------------------------->
print |Our new learn rules:>
print | >
dump(|*>) list-of |updated kets>
print |---------------------------------------->
print |Activation counts for our if-then machines:>
print | >
dump(|op: activation-count>) rel-kets[activation-count]
-- This is interesting to see once or twice, but not in general, so we comment it out:
-- print |---------------------------------------->
-- print |Our maps:>
-- print | >
-- dump(|op: map>) rel-kets[map]